ABSTRACT.

The rapid advancement of the internet and its exponentially increasing usage have
also exposed it to several vulnerabilities. Consequently, it has become extremely
important to develop solutions that can prevent network security issues. One of the most
commonly implemented solutions is the Intrusion Detection System (IDS), which can detect
unusual attacks and unauthorized access to a secured network. In the past, several machine
learning algorithms have been evaluated on the KDD intrusion dataset. This
paper focuses on the implementation of four machine learning algorithms: KNN,
Random Forest, gradient boosted tree, and decision tree. The models are also
implemented through the Auto Model feature to determine its convenience. The results
show that gradient boosted trees achieved the highest accuracy (99.42%), whereas
the random forest algorithm achieved the lowest accuracy (93.63%).

Keywords: Intrusion detection system; Machine learning; RapidMiner; NSL-KDD;
Gradient boosted tree.
1. INTRODUCTION

The complexity of cyber-attacks is increasing with time and, consequently, so is
their malice. In today’s world, every networked environment must take high-level
security measures to ensure secure and reliable communication between
organizations. Software or a device that inspects network traffic for any violation or
malicious activity is termed an intrusion detection system (IDS). This safeguard system
is placed at one or more strategic points in a network to detect suspicious activity [1]. All
traffic to and from the devices connected to the network is analyzed, and the activities
are matched against known attacks. Any violation or malicious activity is collected centrally
and reported through a Security Information and Event Management (SIEM) system.

The need for an intrusion detection system is undeniable; thus, an accurate model
must be developed. In this field, machine learning has proven to be an effective
investigative tool that can detect irregular events in a system’s traffic
[2]. Various models based on machine learning algorithms have been used for the
detection of cyber-attacks [3]. Many researchers have studied machine learning
algorithms and implemented them on the NSL-KDD dataset to enhance the performance
of cyber-attack detection mechanisms. These include Artificial Neural Networks, Naive
Bayesian algorithms, self-organizing maps, Support Vector Machines, Random Forests,
and many more [4]. Salama et al. (2019) applied a restricted Boltzmann machine (RBM) to
the same dataset for feature reduction and then implemented a support vector machine
(SVM) as a classifier with an approximate accuracy of 87% [5].

Amreen et al. [6] proposed an Intelligent Network IDS using Averaged One-Dependence
Estimators (AODE), an improved variant of Naive Bayes. Researchers
have also established a network anomaly detection system with the help of a
discriminative RBM in combination with generative models. This system provided
reasonable accuracy for gathering information from training data [7]. Solane Duque et al.
built a model with the k-means machine learning algorithm and observed a high efficiency
rate along with low false-negative and false-positive rates. It was implied that this algorithm
could be combined with a signature-based approach to lessen the false-negative rate. It
was also observed that the random forest classifier provided better performance than
other algorithms [8]. Furthermore, researchers evaluated eight tree-based
classification algorithms to predict network attacks [9]. In addition, researchers
have worked on areas where the performance of IDS can be improved using
deep learning models. Shone presented a non-symmetric deep autoencoder (NDAE)
based on a deep learning technique for unsupervised feature learning [10]. Another
novel approach, known as SCDNN, was proposed by Tao Ma et al., in which a deep neural
network (DNN) was implemented along with Spectral Clustering (SC) [11].

A recent approach is to implement machine learning algorithms in
integrated environments such as WEKA, KNIME, Orange, KEEL, Azure, IBM SPSS
Modeler, and scikit-learn [12].

This research explores the usability of RapidMiner for the implementation of
machine learning algorithms for IDS. It aims to determine the ease of using a ready-built
data science platform and to evaluate four different classifiers on NSL-KDD, as
shown in Figure 1.
Figure 1. General Flowchart for Classification algorithms
Data Mining Tool
In this paper, models for the classification algorithms were developed in RapidMiner
using the NSL-KDD dataset to evaluate their performance. RapidMiner is a
platform that offers an integrated environment for data
preparation and pre-processing, text mining, predictive analytics, machine learning, and
deep learning. The ROI-centric software allows users to conveniently
build and test models using block coding [13]. The end-to-end data science platform
offers a wide range of processes that work together seamlessly.
Dataset 
The classification algorithms were evaluated on the NSL-KDD dataset, which
can be downloaded from the website of the University of New Brunswick. One
of the earlier datasets used to develop an IDS and a predictive model to differentiate
between intrusions and normal connections is KDD’99, which was obtained as a result of the
Knowledge Discovery and Data Mining Tools Competition (KDD Cup) [14]. A cleaned-up
and revised version of KDD’99 has been built, known as the Network Security
Laboratory-Knowledge Discovery and Data Mining Tools (NSL-KDD) dataset.
The new dataset has resolved some intrinsic issues of its predecessor, yet it is not
a perfect depiction of current real networks [15], owing to the lack of public datasets
for network-based IDSs. However, researchers can employ it as an effective
benchmark dataset to compare different intrusion detection methods. This is
mainly because the train and test sets contain a reasonable number of records,
providing comparable and consistent evaluation results. Moreover, the training
set does not include redundant examples, so classifiers are not biased towards
frequent examples. In addition, duplicate examples are not included in the test sets,
preventing bias towards methods that offer better detection rates on frequent examples.
From each difficulty level group, the number of examples selected is inversely proportional
to the percentage of examples in the original KDD dataset. As a
result, NSL-KDD allows a more accurate evaluation of different learning algorithms. The
attack classes and their subclasses are listed in Table 1.
Table 1. Dataset Description
Classes   Subclasses
DoS       apache2, back, land, neptune, mailbomb, pod, processtable, smurf, teardrop,
          udpstorm, worm
Probe     ipsweep, mscan, nmap, portsweep, saint
U2R       buffer_overflow, loadmodule, perl, ps, rootkit, sqlattack, xterm
R2L       ftp_write, guess_passwd, httptunnel, imap, multihop, named, phf, sendmail,
          snmpgetattack, spy, snmpguess, warezclient, warezmaster, xlock, xsnoop
Classification Algorithms
The machine learning algorithms implemented in this paper are described below:
Random Forest.

The random forest contains a large number of decision trees that each
perform classification individually, and the class with the most votes is taken as the
prediction of the model. Using a random forest, the accuracy can be improved as it combines
the classification power of several trees. However, the key point is to form decision trees
with low correlation within the random forest. Otherwise, the errors of the individual
decision trees can add up and lead to inaccurate classification. For this purpose, feature
randomness and bagging are used to build uncorrelated decision trees that can provide
high accuracy. The model parameters for RF are given in Table 2.

Table 2. Model Parameters for RF

Parameters   No. of trees   Criterion    Maximal Depth
Values       100            Gini index   10
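
The two ingredients named above, bagging and feature randomness, together with the final vote, can be sketched in a few lines of Python. This is only an illustration and not the RapidMiner operator: the helper names (`bootstrap_sample`, `random_feature_subset`, `majority_vote`), the toy record indices, the feature counts, and the votes are all hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) examples from rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Feature randomness: choose k distinct feature indices for one tree."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)                       # fixed seed, only for repeatability
rows = list(range(10))                       # stand-ins for ten training records
sample = bootstrap_sample(rows, rng)         # training set seen by one tree
features = random_feature_subset(8, 3, rng)  # features visible to that tree

# Hypothetical votes from five trees for a single connection record.
votes = ["DoS", "normal", "DoS", "Probe", "DoS"]
prediction = majority_vote(votes)
print(sample, features, prediction)
```

Because every tree sees a different resampled set of records and a different subset of features, the trees tend to make different mistakes, which is what allows the vote to cancel individual errors.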


KNN. Another commonly used machine learning algorithm is K-Nearest Neighbor (KNN), which
uses labeled multiclass data to predict the class of a new sample. The classifier calculates
the distance from the new sample point to all existing points and classifies the new sample
point based on its K closest neighbors in the dataset. The model parameters are given in
Table 3.
Table 3. Model Parameters for KNN
Parameters   K
Values       3
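
As a minimal sketch of the procedure described above, assuming Euclidean distance (the source does not state which distance measure was configured), a K = 3 classifier can be written as follows. The two-dimensional points and the function name `knn_predict` are hypothetical and stand in for preprocessed NSL-KDD records.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Distance from the query to every existing point (Euclidean, an assumption).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, already-scaled feature vectors (hypothetical values).
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_labels = ["normal", "normal", "normal", "DoS", "DoS", "DoS"]

near_origin = knn_predict(train_points, train_labels, (0.3, 0.3))
near_cluster = knn_predict(train_points, train_labels, (4.9, 5.0))
print(near_origin, near_cluster)
```

Since every prediction requires a distance to all stored points, the cost of KNN grows with the size of the training set.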
Gradient Boosted Tree (GBT). The gradient boosted tree is an ensemble method that uses
decision trees linked in series, where every tree tries to minimize the error
generated by the previous tree. GBT is a greedy ML algorithm, so regularization
methods that penalize different components of the algorithm are employed to reduce
overfitting [16]. Even though the sequential algorithm takes longer to learn, it offers high
accuracy for classification problems. The model parameters of GBT are shown in Table 4.
Table 4. Model Parameters for GBT

Parameters   No. of trees   Maximal Depth   Min rows   Learning rate   Sample rate
Values       50             10              10         0.01            1.0
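
The idea of trees linked in series can be illustrated with a deliberately simplified sketch. It assumes squared-error loss on a 0/1 label and depth-1 trees (stumps) on a single feature, whereas the models in this paper use a maximal depth of 10 and RapidMiner's own implementation. Only the number of trees (50) and the learning rate (0.01) mirror Table 4; all function names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.01):
    """Each new stump is fitted to the errors left by the stumps before it."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinking each correction by the learning rate acts as regularization.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # toy labels: 0 = normal, 1 = attack
model = fit_boosted(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict_boosted(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

With a small learning rate each tree corrects only a fraction of the remaining error, so training is slower but less prone to overfitting, which matches the trade-off described in the text.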


Decision Tree. The decision tree is a supervised machine learning algorithm that
is used for both classification and regression problems. The algorithm uses a tree
representation in which a leaf node denotes a class label, while the attributes are tested at
the tree’s internal nodes (Table 5).

Table 5. Model Parameters for DT

Parameters   Criterion    Maximal Depth
Values       Gini index   10
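
The Gini index criterion listed in Tables 2 and 5 can be illustrated with a short sketch. It uses the standard definition of Gini impurity (one minus the sum of squared class proportions); the function names and the toy labels are hypothetical, and the exact split search used by RapidMiner is not described in the source.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def split_gini(left_labels, right_labels):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    total = len(left_labels) + len(right_labels)
    left_weight = len(left_labels) / total
    right_weight = len(right_labels) / total
    return (left_weight * gini_impurity(left_labels)
            + right_weight * gini_impurity(right_labels))


parent = ["normal"] * 4 + ["DoS"] * 4
parent_gini = gini_impurity(parent)
# A split that separates the classes perfectly.
pure_split = split_gini(["normal"] * 4, ["DoS"] * 4)
# A split that leaves both children as mixed as the parent.
mixed_split = split_gini(["normal", "normal", "DoS", "DoS"],
                         ["normal", "normal", "DoS", "DoS"])
print(parent_gini, pure_split, mixed_split)
```

A tree-growing procedure under this criterion prefers the attribute test with the lowest weighted impurity, so the first split above would be chosen over the second.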


Performance Metrics

The performance metrics used in this research include:

Accuracy. Accuracy is the percentage of correct predictions made when the model is
tested. The accuracy of a classification model is determined from its confusion matrix.
It gives a general evaluation of the model, provided the dataset is balanced.
Classification error. It is the percentage of incorrect predictions made by a classifier,
where the incorrect predictions are the sum of false positives and false negatives.
Weighted mean recall. Recall is the proportion of actual positive instances that are
correctly predicted as positive. The weighted mean recall is the average of the per-class
recall values with weights that are equal to each class’s probability.

Weighted mean precision. Precision is measured to determine confidence in the
positive predictions of the applied model. It is the proportion of instances predicted as
positive that are actually positive. Weighted mean precision is the average of the per-class
precision values with weights equal to each class’s probability.

Kappa. Cohen’s kappa measures how closely the instances classified by a machine
learning algorithm match the ground truth after accounting for the agreement expected by
chance. A value of 0 represents agreement no better than chance, while 1 represents
complete agreement [17]. Generally, it is considered a more robust gauge than a basic
percent agreement measurement.
Logistic Loss. It is the negative average of the logarithm of the probabilities predicted for
the correct classes and indicates how close the predicted probabilities are to the actual
values.
Root Mean Squared Error (RMSE). RMSE is an absolute measure of fit that indicates
how far, on average, the predictions of a model deviate from the actual values.
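
The last two metrics can be made concrete with a small sketch based on their textbook definitions. How RapidMiner derives RMSE from class confidences is not stated in the source, so the `rmse` helper below is written for plain numeric targets; the function names, class probabilities, and target values are hypothetical.

```python
import math


def logistic_loss(true_labels, predicted_probs, eps=1e-15):
    """Negative mean log of the probability assigned to each true class."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[label], eps), 1.0 - eps)
        total += math.log(p)
    return -total / len(true_labels)


def rmse(actual, predicted):
    """Square root of the mean squared difference between two sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Two toy records with per-class probabilities (hypothetical values).
labels = ["normal", "DoS"]
probs = [
    {"normal": 0.9, "DoS": 0.1},
    {"normal": 0.2, "DoS": 0.8},
]
loss = logistic_loss(labels, probs)
error = rmse([1.0, 0.0, 1.0], [0.9, 0.2, 0.6])
print(loss, error)
```

Both quantities are lower for better models: a confident correct prediction contributes almost nothing to the logistic loss, while a confident wrong prediction is penalized heavily.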
Auto Model in RapidMiner. RapidMiner Studio also facilitates an accelerated method of
developing and validating models through its extension known as Auto Model. This
feature can address three main problem classes: prediction (classification and regression),
clustering, and outlier detection. After the preprocessing and data mapping of NSL-KDD,
Auto Model provides a selection of DT, RF, and GBT. Suitable parameters for
each model were selected along with feature engineering and optimization techniques.
3. RESULTS

The performance metric values of each classifier are reported in
Tables 6 to 9 for KNN, RF, GBT, and DT, respectively. The class precision and recall
for normal, DoS, R2L, Probe, and U2R evaluated on KNN, RF, GBT, and DT are
illustrated in Figures 2 to 5, respectively.
Table 6. Performance Evaluation of KNN

Performance metrics           Values for KNN
Accuracy (%)                  98.70
Classification error (%)      1.30
Weighted mean recall (%)      81.43
Weighted mean precision (%)   91.63
Kappa                         0.978
Logistic loss                 0.319
RMSE                          0.1

               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   30577         45         64         82           30         99.28%
pred. DoS      59            21227      2          188          1          98.84%
pred. R2L      65            2          1471       11           5          94.66%
pred. Probe    117           81         15         5350         3          96.12%
pred. U2R      4             0          0          0            9          69.23%
class recall   99.21%        99.40%     94.78%     95.01%       18.75%


Figure 2. Recall and Precision percentages for each class for KNN 
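
The summary values reported for KNN can be cross-checked directly from its confusion matrix. The sketch below is an independent recomputation in Python, not RapidMiner's code; the function name `summarize` is hypothetical and the matrix entries are copied from the KNN confusion matrix above. On these numbers, the reported weighted mean recall and precision coincide with the plain average of the five per-class values.

```python
CLASSES = ["normal", "DoS", "R2L", "Probe", "U2R"]

# Rows are predicted classes, columns are true classes (KNN confusion matrix).
KNN_MATRIX = [
    [30577, 45, 64, 82, 30],
    [59, 21227, 2, 188, 1],
    [65, 2, 1471, 11, 5],
    [117, 81, 15, 5350, 3],
    [4, 0, 0, 0, 9],
]


def summarize(matrix):
    """Accuracy, per-class precision and recall, and Cohen's kappa."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                         # predicted counts
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # true counts
    correct = sum(matrix[i][i] for i in range(n))
    accuracy = correct / total
    precision = [matrix[i][i] / row_sums[i] if row_sums[i] else 0.0
                 for i in range(n)]
    recall = [matrix[i][i] / col_sums[i] if col_sums[i] else 0.0
              for i in range(n)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mean_precision": sum(precision) / n,
        "mean_recall": sum(recall) / n,
        "kappa": kappa,
    }


stats = summarize(KNN_MATRIX)
for name, p, r in zip(CLASSES, stats["precision"], stats["recall"]):
    print(f"{name:8s} precision={p:.4f} recall={r:.4f}")
print(f"accuracy={stats['accuracy']:.4f} kappa={stats['kappa']:.3f}")
```

The same function can be applied to the RF, GBT, and DT matrices; the zero-row guard matters for RF, which predicts no R2L or U2R instances.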
Table 7. Performance Evaluation of RF

Performance metrics           Values for RF
Accuracy (%)                  93.63
Classification error (%)      6.37
Weighted mean recall (%)      55.91
Weighted mean precision (%)   57.20
Kappa                         0.889
Logistic loss                 0.364
RMSE                          0.256

               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   30700         1262       1513       672          48         89.78%
pred. DoS      8             20090      1          123          0          99.35%
pred. R2L      0             0          0          0            0          0.00%
pred. Probe    114           3          38         4836         0          96.89%
pred. U2R      0             0          0          0            0          0.00%
class recall   99.60%        94.08%     0.00%      85.88%       0.00%


Figure 3. Recall and Precision percentages for each class for RF 
Table 8. Performance Evaluation of GBT

Performance metrics           Values for GBT
Accuracy (%)                  99.42
Classification error (%)      0.58
Weighted mean recall (%)      89.28
Weighted mean precision (%)   93.39
Kappa                         0.990
Logistic loss                 0.456
RMSE                          0.453

               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   34504         23         30         62           8          99.64%
pred. DoS      26            23989      2          27           0          99.77%
pred. R2L      47            0          1690       10           16         95.86%
pred. Probe    97            1          7          6234         2          98.00%
pred. U2R      0             1          7          2            28         73.68%
class recall   99.51%        99.85%     96.79%     98.41%       51.85%


Figure 4. Recall and Precision percentages for each class for GBT 
Table 9. Performance Evaluation of DT

Performance metrics           Values for DT
Accuracy (%)                  97.26
Classification error (%)      2.74
Weighted mean recall (%)      85.84
Weighted mean precision (%)   88.56
Kappa                         0.954
Logistic loss                 0.329
RMSE                          0.163

               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   29997         22         44         6            0          99.76%
pred. DoS      104           21051      0          256          0          98.32%
pred. R2L      336           0          1464       125          24         75.12%
pred. Probe    385           282        39         5244         2          88.10%
pred. U2R      0             0          5          0            22         81.48%
class recall   97.32%        98.58%     94.33%     93.13%       45.83%


Figure 5. Recall and Precision percentages for each class for DT 
Figure 6 shows the comparison of classification accuracies for the four models. 


Figure 6. Comparison of accuracies for ML Algorithms 
Figures 7 and 8 present the class-wise comparison of precision and recall for the ML models.
Figure 7. Comparison of Class-wise Precision for ML Algorithms
Figure 8. Comparison of Class-wise Recall for ML Algorithms
Results from Auto Model 


Figure 9 shows the overall performance of the three machine learning algorithms: DT, RF,
and GBT.


Model                    Accuracy   Standard Deviation   Gains    Total Time   Training Time
Decision Tree            62.4%      ±0.2%                9,034    4min29s      23ms
Random Forest            95.9%      ±0.1%                34,370   7min55s      1s
Gradient Boosted Trees   97.1%      ±0.1%                32,380   8min53s      2s


Figure 9. Overview of Classifier’s performance in Auto Model 
Figures 10 to 12 show the values of the performance metrics and the confusion matrices for
DT, RF, and GBT implemented in Auto Model.
Insights on recall and precision for each class can be easily extracted from the confusion
matrices generated by the Auto Model.


Decision Tree - Performance
Accuracy               62.4%   ±0.2%
Classification Error   37.6%   ±0.2%

Confusion Matrix (only the true normal column is fully legible)
               true normal
pred. normal   20981
pred. DoS      917
pred. R2L      37
pred. Probe    39
pred. U2R      0
class recall   95.48%

true DoS column (partially legible): 16, 27, 0

Figure 10. Performance Metrics and Confusion Matrix for DT in Auto Model
Random Forest - Performance
Accuracy               95.9%   ±0.1%
Classification Error   4.1%    ±0.1%

Confusion Matrix
               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   17967         183        76         82           4          98.13%
pred. DoS      226           12345      18         54           1          97.64%
pred. R2L      193           254        834        149          20         57.52%
pred. Probe    185           4          6          3103         0          94.09%
pred. U2R      3             0          0          0            10         76.92%
class recall   96.73%        96.55%     89.29%     91.59%       31.25%

Figure 11. Performance Metrics and Confusion Matrix for RF in Auto Model


Gradient Boosted Trees - Performance
Accuracy               97.1%   ±0.1%
Classification Error   2.9%    ±0.1%

Confusion Matrix
               true normal   true DoS   true R2L   true Probe   true U2R   class precision
pred. normal   18357         43         123        516          6          96.39%
pred. DoS      83            12817      2          41           1          99.02%
pred. R2L      40            0          779        110          9          83.05%
pred. Probe    25            4          12         2728         1          98.48%
pred. U2R      0             1          2          0            14         82.35%
class recall   99.20%        99.63%     84.85%     80.35%       45.16%

Figure 12. Performance Metrics and Confusion Matrix for GBT in Auto Model
4. DISCUSSIONS 

For each model, accuracy, classification error, weighted mean recall, weighted mean
precision, kappa, logistic loss, and RMSE are calculated. Comparing these
values aids in evaluating the performance of the machine learning algorithms. The
confusion matrix provides a summary of the prediction results of a
classification model for all the classes.

Statistical Findings 

The weighted mean recall is highest for GBT (89.28%) and
lowest for RF (55.91%). The weighted mean precision is also
highest for GBT (93.39%) and lowest for RF (57.20%). The value of Cohen's
kappa coefficient is likewise highest for GBT (0.990) and lowest for RF (0.889). The
highest logistic loss is observed for the GBT model (0.456), while KNN has the lowest
logistic loss (0.319). Similarly, the highest RMSE is observed for GBT (0.453), while KNN
has the lowest RMSE (0.1).

It has been observed that DoS attacks are identified with high precision by all the
classifiers. However, the RF classifier did not identify any R2L or U2R attacks. The class-wise
precision and recall comparison for each machine learning algorithm is shown in
Figure 7 and Figure 8, respectively. In addition, all the classifiers have comparatively low
precision and recall for the U2R class. This is potentially because R2L, U2R, and Probe
instances collectively make up only about 2% of the dataset.

Comparing the accuracies of the machine learning algorithms, it is observed that
the highest accuracy rate (99.42%) has been attained through the GBT model, while RF
provides the lowest accuracy (93.63%).
Findings from Auto Model

The models built in Auto Model show similar results, as GBT outperformed
RF and DT with the highest accuracy of 97.1%. DT offers an accuracy of 62.4% and a
classification error of 37.6%. Meanwhile, RF has an accuracy of 95.9% and a classification
error of 4.1%.

The results show that DT has its highest class precision for Probe (90.79%) and its
highest class recall for DoS (99.63%). RF has its highest class precision for normal (98.13%)
and its highest class recall for normal (96.73%). GBT has its highest class precision for DoS
(99.02%) and its highest class recall for DoS (99.63%). The class precision and recall for
U2R are the lowest among all classes for all models, except for the class precision of RF,
where U2R is second lowest.

5. CONCLUSION AND FUTURE RECOMMENDATIONS

As a result of increased connectivity between computers, the implementation of
intrusion detection has become vital for secure networks. Researchers have designed
classification and clustering models using machine learning algorithms such as Naive
Bayes, logistic regression, RF, and SVM on NSL-KDD. The KDD dataset comprises
approximately 79% DoS attacks and 19% normal packets, while R2L, U2R, and Probe make
up 2% of the dataset collectively. This research paper discusses the classification
performance of four machine learning algorithms, namely KNN, Random Forest,
gradient boosted tree, and decision tree, on the NSL-KDD dataset. Based on this
research, GBT outperformed all the other classification algorithms both in the manually
built designs and in the Auto Model feature. GBT provided the highest accuracy (99.42%),
while the random forest algorithm achieved the lowest accuracy (93.63%). Moreover, it is
found that it is more convenient to implement the machine learning models in
RapidMiner through Auto Model. This method is not only time-efficient and compact
but also reduces the burden of implementing models via complex syntax. Different
metrics including accuracy, classification error, weighted mean recall, weighted mean
precision, and kappa are computed. All of these machine learning classifiers offer
acceptable accuracy on NSL-KDD. In the future, more recent
datasets such as the CIC-Bell-DNS-EXF-2021 dataset can be used to evaluate the machine
learning algorithms. Other ensemble models and deep learning algorithms can also be
tested on newer datasets.